1.  INTRODUCTION


The  piecewise  linear  layered  neural  network  is  a  simple  computing  device  with 
potential  for  implementing  pattern  recognition  and  image  processing  algorithms.  Questions 
regarding  mapping  capabilities  and  weight  assignment  for  these  networks  lead  to  problems 
in  combinatorial  and  computational  geometry,  due  to  the  discrete  and  essentially  linear 
nature of the piecewise linear neuron transfer function.

This  report  discusses  some  specific  techniques  and  results  for  piecewise  linear 
networks  (PLNs).  The  basic  problems  motivating  this  discussion  concern  the  need  to 
design  networks  and  assign  their  weights  in  order  to  obtain  network  transformations  which 
map specified input vectors into specified output vectors. For example, one may have a
sample prototype in d-dimensional space for each of N pattern classes. The objective may
then be to map the ith prototype xi into a specified m-dimensional vector yi, for 1 ≤ i ≤ N.

Given d, m, N, and the N pairs (xi, yi), how does one determine the number of
hidden layers and their dimensions for a suitable layered network? This is the network
design problem. Given a network of specified type, how does one then determine a set of
weights that will map xi into yi for all i? This is the weight assignment problem.
Recent work in weight assignment has focused largely on iterative algorithms like back
propagation. Our focus is on methods for weight initialization and assignment which avoid
costly iterative procedures.

References  1  and  2  establish  relationships  between  the  dimensions  of  the  network 
layers and the numbers of pairs which can be accommodated. References 2 through 4 give
examples  of  noniterative  weight  assignment  procedures.  The  methods  of  References  2  and 
3  employ  linear  algebraic  techniques  which  are  effective  for  large  classes  of  well-behaved 
sigmoidal  neuron  transfer  functions.  The  method  of  Reference  4  utilizes  convexity 
properties  as  well  as  affine  geometry,  and  applies  only  to  the  piecewise  linear  neuron 
transfer  function. 

Basic  results  from  linear  algebra  and  convexity  can  be  found  in  References  5  and  6. 
The  fundamentals  of  combinatorial  and  computational  geometry  are  presented  in  References 
7  and  8.  Separation  and  mapping  capabilities  of  layered  networks  are  discussed  in 
References  9  through  12.  Reference  13  contains  the  fundamental  material  on 
multidimensional  order  types. 

Section  2  presents  basic  definitions  and  notation.  Several  concepts  from  geometric 
complexity are discussed in Section 3. These include the interior relation (INT),
dichotomies, and decomposition by hyperplanes. A construction for (d,2,m) mappings is
also given. Section 4 contains two theorems pertaining to order modification by PLNs, as
well as two examples of (2,2,2,2) PLN mappings on sets of five planar points.


2.  DEFINITIONS  AND  NOTATION


All  patterns  reside  in  real  affine  spaces.  The  layer-to-layer  mappings  are 
compositions  of  affine  transformations  and  the  coordinate-wise  neuron  transfer  function. 
Unless  stated  otherwise,  we  assume  throughout  that  the  neuron  transfer  (squashing) 
function  is  the  piecewise  linear  function  p,  defined  by 


\[
p(t) = \begin{cases} -1 & \text{for } t < -1 \\ t & \text{for } -1 \le t \le 1 \\ 1 & \text{for } 1 < t \end{cases}
\]


The  function  p  is  extended  to  vectors  in  a  coordinate-wise  fashion.  That  is, 


\[ p(x) = (p(x_1), p(x_2), \ldots, p(x_d)) \]

where

\[ x = (x_1, x_2, \ldots, x_d). \]
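As a minimal sketch, the squashing function and its coordinate-wise extension can be written as follows. The names `p` and `p_vec` are chosen here for illustration and do not come from the report.

```python
def p(t):
    """Piecewise linear squashing function: clamp t to the interval [-1, 1]."""
    return max(-1.0, min(1.0, t))


def p_vec(x):
    """Coordinate-wise extension of p to a vector given as a list of numbers."""
    return [p(t) for t in x]
```

For instance, `p_vec([-3, 0.5, 2])` clamps the first and last coordinates to -1 and 1 and leaves the middle one unchanged.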


R(d) denotes d-dimensional real affine space while I(d) denotes the d-dimensional real cube
[-1,1]^d. The input set X and the desired output set Y are assumed to be in general position
in R(d) and R(m), respectively. An (L0, L1, ..., LK, LK+1)-network is a feed-forward layered
network with

input dimension = d = L0
output dimension = m = LK+1


and


K hidden layers with dimensions = Lj, 1 ≤ j ≤ K.


The nodes in layer j are forward connected to those in layer j+1 for 0 ≤ j ≤ K. For
economy of notation we let

\[ L^* = (L_0, L_1, L_2, \ldots, L_K, L_{K+1}). \]


We say that L* accommodates an integer N if, for every pair (X, Y) of N distinct
inputs and N desired outputs, there exists a weight assignment for an L*-network which
effects the mapping xi → yi, for 1 ≤ i ≤ N. Here the sets X and Y are assumed to be in
general position in R(d) and R(m), respectively. Nmax(L*) denotes the largest integer N
which is accommodated by L*.

Among the N-sets of input/output pairs are those whose output sets lie in the interior
of the cube I(m). Any mapping that accommodates such a set makes no use of the piecewise
linear truncation in the output space. Thus, Nmax has the same value without the
'squash' as with it. Therefore, we will usually omit the application of the function p at the
output layer.

The  mapping  from  layer  j  to  j+1  is  given  by 


\[ \mathcal{A}_j(z) = p(A_j z + b_j) \]

where the matrix Aj is Lj+1 by Lj, and the vector bj is Lj+1 by 1.

The  total  number  of  weights  available  in  a  network  of  type  L*  is  denoted  by  Wgt(L*) 
and  is  given  by 

\[ \mathrm{Wgt}(L^*) = (d+1)L_1 + (L_1+1)L_2 + \cdots + (L_{K-1}+1)L_K + (L_K+1)m. \]
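The weight count is a sum over consecutive layer pairs, with one extra bias weight per receiving node: (d+1)L1 + (L1+1)L2 + ... + (LK+1)m. A short sketch, where the function name `wgt` and the tuple argument are assumptions made for illustration:

```python
def wgt(dims):
    """Total number of weights for a network of type dims = (L0, L1, ..., LK, LK+1).

    Each node in layer j+1 receives dims[j] connection weights plus one bias.
    """
    return sum((dims[j] + 1) * dims[j + 1] for j in range(len(dims) - 1))
```

With this count, `wgt((2, 3, 1))` gives 13, in agreement with the thirteen weights of the (2,3,1)-network in Table 3, and `wgt((2, 2, 2, 2))` gives 18, the number of entries in three 2-by-3 weight matrices.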


Ndim(L*) is an upper bound for Nmax(L*), established in Reference 1, and given by


A subset C of R(d) is called convex provided λ1c1 + λ2c2 ∈ C whenever c1 ∈ C,
c2 ∈ C, λ1 ≥ 0, λ2 ≥ 0, and λ1 + λ2 = 1. Equivalently, C is convex if, and only if, C is
closed under convex combinations. If C is topologically closed and convex, the boundary
of C, denoted Bound(C), is the topological boundary of C in the topology of Aff(C).
Aff(C) denotes the affine closure of C, i.e. the smallest affine subspace of R(d) containing
C. The interior of C, denoted Int(C), is just C \ Bound(C). A point c is an extreme point of
C whenever there exists a hyperplane H in Aff(C) for which H ∩ C = {c}, and H does not
separate C. Note that if Aff(C) is k-dimensional, then a hyperplane H in Aff(C) must be a
(k-1)-dimensional affine subspace of R(d) which lies in Aff(C). The set of extreme points
of C is denoted Ext(C).


EXAMPLE  1 

Let  d  =  3,  and  let 

\[ C = \{(c_1, c_2, c_3) : c_1^2 + c_2^2 \le 1 \text{ and } c_3 = 1\}. \]

The topological boundary of C in R(3) is C. Since Aff(C) is the hyperplane {(x1, x2, x3) :
x3 = 1}, we have

\[ \mathrm{Bound}(C) = \{(c_1, c_2, 1) : c_1^2 + c_2^2 = 1\}, \]
\[ \mathrm{Int}(C) = \{(c_1, c_2, 1) : c_1^2 + c_2^2 < 1\}, \]


and 


Ext(C)  =  Bound  (C). 

Therefore,  Bound(C)  is  the  circle,  Int(C)  is  the  open  disk,  and  the  set  of  extreme  points  is 
also  the  circle. 

The last equality does not generally hold. Closed convex polytopes in R(d) have
(d-1)-dimensional boundaries, but only finite 0-dimensional sets of extreme points, which
are  called  vertices. 


EXAMPLE  2 

Let  d  =  3,  and  let 


\[ C = \{(c_1, c_2, c_3) : \text{all } c_i \ge 0 \text{ and } c_1 + c_2 + c_3 = 1\}. \]


C  is  just  the  2-simplex  embedded  in  R(3).  In  this  case 


\[ \mathrm{Bound}(C) = \{(c_1, c_2, c_3) \in C : \text{some } c_i = 0\}, \]
\[ \mathrm{Int}(C) = \{(c_1, c_2, c_3) \in C : \text{all } c_i > 0\}, \]

and

\[ \mathrm{Ext}(C) = \{(c_1, c_2, c_3) \in C : \text{some } c_i = 1\}. \]


In  this  example  Bound(C)  is  the  one-dimensional  boundary  of  the  triangle,  while  Ext(C) 
consists  only  of  the  three  vertices. 

For X ⊂ R(d) the convex closure of X, denoted Conv(X) or Hull(X), is the
intersection of all convex sets which contain X. Let C(X) denote the topological
closure of Conv(X) in R(d). We define Bound(X), Int(X), and Ext(X) in terms of the
closed convex set C(X):

\[ \mathrm{Bound}(X) = X \cap \mathrm{Bound}(C(X)), \]
\[ \mathrm{Int}(X) = X \cap \mathrm{Int}(C(X)), \]

and

\[ \mathrm{Ext}(X) = X \cap \mathrm{Ext}(C(X)). \]

EXAMPLE  3 

Let  d  =  2,  and  define  X,  a  set  of  nine  planar  lattice  points,  by 


\[ X = \{(x_1, x_2) : -1 \le x_i \le 1 \text{ and } x_i \text{ an integer},\ i = 1, 2\}. \]


Then 

Bound(X)  =  {(-1,  -1),  (-1,  0),  (-1, 1),  (0,  -1),  (0, 1),  (1,  -1),  (1,0),(1,  1)} 
Int(X)  =  {(0,  0)} 


Ext(X)  =  {(-1,  -1),  (-1,  1),  (1,  -1),  (1,  1)}  . 

X has four vertices, four other boundary points, and one interior point.
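The classification in Example 3 can be checked mechanically for planar point sets. The sketch below is an illustration only: it finds the hull vertices with Andrew's monotone chain (a standard technique, not one described in the report), then labels every point lying on a hull edge as a boundary point. The function names are hypothetical.

```python
def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); positive if counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_vertices(points):
    """Extreme points of a planar set, in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half(seq):
        chain = []
        for q in seq:
            # Pop while the turn is clockwise or collinear, so only vertices remain.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], q) <= 0:
                chain.pop()
            chain.append(q)
        return chain[:-1]

    return half(pts) + half(reversed(pts))


def classify(points):
    """Return (Ext(X), Bound(X), Int(X)) as sets for a finite planar set X."""
    ext = hull_vertices(points)
    edges = list(zip(ext, ext[1:] + ext[:1]))

    def on_edge(q, a, b):
        return (cross(a, b, q) == 0
                and min(a[0], b[0]) <= q[0] <= max(a[0], b[0])
                and min(a[1], b[1]) <= q[1] <= max(a[1], b[1]))

    bound = {q for q in points if any(on_edge(q, a, b) for a, b in edges)}
    return set(ext), bound, set(points) - bound


X = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
ext, bound, interior = classify(X)
```

For the nine lattice points this yields four extreme points (the corners), eight boundary points, and the single interior point (0, 0). The exact equality test in `on_edge` is adequate for integer coordinates; floating-point data would need a tolerance.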

In this report we will be primarily interested in finite sets X in general position in
R(d). For such sets, Bound(X) = Ext(X) = the set of vertices of the polytope Conv(X), and
Int(X) consists of all remaining points of X.

The following nest of subsets is associated with each subset X ⊂ R(d):

\[ \mathrm{Ext}(X) \subseteq \mathrm{Bound}(X) \subseteq X \subseteq \mathrm{Conv}(X). \]

Suppose  X  and  Y  satisfy  the  following: 

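\[ X = \{x_i : 1 \le i \le N\} \subset R^{(d)}, \]

\[ Y = \{y_i : 1 \le i \le N\} \subset R^{(e)}, \]

where

\[ y_i = A^+(x_i), \quad 1 \le i \le N, \]

and A+ : R(d) → R(e) is an affine transformation. Then xi ∈ Bound(X) whenever yi ∈
Bound(Y), i.e.

\[ \mathrm{Bound}(A^+(X)) \subseteq A^+(\mathrm{Bound}(X)). \]

If A+ is injective (1-1), then equality holds.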

Table 1 contains the coefficients of two affine mappings A1+ and A2+ from R(2) to R(2).
A2+ is bijective but A1+ is not. Table 2 gives the coordinates of three 4-sets X, Y, and Z in
R(2), satisfying Y = A1+(X), Z = A2+(X). Figure 1 shows the three 4-sets in R(2). A1+ is not
bijective, and the boundary point x3 in X goes to an interior point y3. The second mapping
A2+ is bijective, and the boundary {x1, x2, x3} maps onto the boundary {z1, z2, z3}.


TABLE 1. Two Affine Mappings.

TABLE 2. Three Planar 4-Sets.

FIGURE 1. Three Planar 4-Sets.


When N > d > e, and X is in general position in R(d), then there always exists a linear
mapping A, with maximal rank e, such that Bound(A(X)) ≠ A(Bound(X)). Indeed, for
any x0 ∈ Ext(X), A can be selected so that A(x0) is an interior point of A(X). To prove this
we select a point c in Conv(X) such that X ∪ {c} is in general position, and let
h0 = x0 - c. Since h0 ≠ 0, the subspace H = h0⊥, perpendicular to h0, is a hyperspace in
R(d). Therefore, dim(H) = d-1 ≥ e, so we may choose a set {h1, h2, ..., he} of e linearly
independent vectors in H. The desired transformation A is then defined by

\[ A(x) = [h_1, h_2, \ldots, h_e]^T x. \]

Clearly A maps R(d) to R(e) and has maximal rank e. Moreover, since A(x0) = A(c), A(x0)
must be an interior point of A(X). Selected perturbations of c and the hj's allow A to be
defined so that A(X) is in general position in R(e).
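The construction can be sketched numerically: given x0 and c, build e independent rows perpendicular to h0 = x0 - c, so that A sends x0 and c to the same image. The sketch below uses Gram-Schmidt on the standard basis to pick the rows; that choice, and the names `collapsing_map` and `apply_map`, are assumptions made for illustration rather than the report's procedure.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def collapsing_map(x0, c, e):
    """Return e linearly independent rows h_1..h_e, each perpendicular to h0 = x0 - c."""
    d = len(x0)
    h0 = [a - b for a, b in zip(x0, c)]
    basis = [h0]   # mutually orthogonal vectors found so far, starting with h0
    rows = []
    for i in range(d):
        v = [1.0 if k == i else 0.0 for k in range(d)]
        for b in basis:
            # Remove the component of v along each earlier vector.
            coef = dot(v, b) / dot(b, b)
            v = [vk - coef * bk for vk, bk in zip(v, b)]
        if dot(v, v) > 1e-12:
            basis.append(v)
            rows.append(v)
        if len(rows) == e:
            return rows
    raise ValueError("requires e <= d - 1")


def apply_map(A, x):
    """Matrix-vector product for A given as a list of rows."""
    return [dot(row, x) for row in A]


x0 = [1.0, 2.0, 3.0]
c = [0.5, 0.25, 1.0]
A = collapsing_map(x0, c, 2)
```

Here `apply_map(A, x0)` and `apply_map(A, c)` agree up to rounding, which is the property A(x0) = A(c) used in the argument. The general-position perturbation mentioned in the text is not attempted.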


3.  PROJECTIONS 


The  piecewise  linear  function,  although  quite  simple  to  define  and  visualize,  delivers 
considerable  complexity  when  composed  with  affine  mappings  and  iterated.  The  space  of 
affine  mappings  is  closed  under  composition  and  preserves  certain  convexity  properties  of 
subsets of the domain. In particular, the relations


(INT)  xi ∈ Int({x1, x2, ..., xn})


are  preserved  by  injective  linear  mappings.  The  modification  of  these  properties  provides 
the  basis  for  the  increased  capabilities  of  nonlinear  networks. 

Consider, for example, the (1,L,1)-network. If the neuron transfer function is linear,
then the network transfer function preserves the (INT) relation. In R(1) this means that the
function must be monotone. By contrast the (1,L,1)-PLN can produce up to L-1 local
maxima and L-1 local minima. This imposes an upper bound of 2L on Nmax(1,L,1).
Surprisingly, Nmax(1,L,1) is actually equal to 2L (Reference 12). For more general
squashing functions one can accommodate L+1 pairs using straightforward linear algebraic
techniques (References 3 and 4). It is also known that 2L-1 pairs can be approximated
using a smooth sigmoid (Reference 12). Thus, for the general sigmoid there is quite a gap
between the number of pairs that can be accommodated exactly and the number that can be
approximated using known methods.

Comparing the (1,L,1)-PLN to the perceptron is perhaps more interesting. In
perceptrons the neurons employ the threshold transfer function T, defined by

\[
T(t) = \begin{cases} -1 & \text{for } t < 0 \\ 1 & \text{for } 0 \le t \end{cases}
\]


NAWCWPNS  TP  8217  Revision  1 


Realistic comparison requires criteria other than bounds for Nmax. The outputs at each
hidden layer of a perceptron all lie at the vertices of the cube. This greatly restricts the
possible output sets for a perceptron. In particular, the set of outputs Y must all lie in the
boundary of Conv(Y), i.e. Bound(Y) = Y. This means that no set of pairs can be
accommodated by a perceptron when Int(Y) is non-empty. Thus, for the (d,L,m)-perceptron,
Nmax ≤ m + 1.

Another  measure  of  capability  often  applied  to  families  of  real-valued  functions  is 
derived  from  the  notion  of  realizable  dichotomies  (References  9  and  12).  A  set  of  points  is 
accommodated  by  a  network  provided  all  dichotomies  are  realizable.  Alternatively  such  a 
set is said to be shattered by the network. A dichotomy of X is just a decomposition of X
into two disjoint subsets X1, X2. The dichotomy {X1, X2} is realizable by a family F of
real-valued functions provided there exists f ∈ F satisfying:

\[ f(x_1) < 0 < f(x_2) \quad \text{whenever } x_1 \in X_1 \text{ and } x_2 \in X_2, \]

or

\[ f(x_2) < 0 < f(x_1) \quad \text{whenever } x_1 \in X_1 \text{ and } x_2 \in X_2. \]


An integer N is now said to be accommodated by F provided every set of N points is
shattered by F. The maximum value of N that can be accommodated, in this sense, by a
family of functions is denoted by Ndich. In this setting, at most L+1 points can be
accommodated in general by a (1,L,1)-perceptron. Suppose X is an (L+2)-subset of R.
Let

\[ X = \{x_1, x_2, \ldots, x_{L+2}\}, \]

where

\[ x_1 < x_2 < \cdots < x_{L+2}. \]

Now let X1 = {x_{2k+1} : 1 ≤ 2k+1 ≤ L+2}, and X2 = {x_{2k} : 2 ≤ 2k ≤ L+2}. The components
X1 and X2 of the dichotomy are interleaved on the line. Every interval (xi, xi+1) must be
cut by one of the hidden neurons. Since there are L+1 intervals and only L neurons, this is
not possible.

Using realizable dichotomies for measuring network mapping capability yields

\[
N_{dich} = \begin{cases} L + 1 & \text{for } (1,L,1)\text{-perceptrons} \\ 2L & \text{for } (1,L,1)\text{-PLNs} \end{cases}
\]

This comparison shows that, for large L, the (1,L,1)-PLN has roughly twice the capability
of the (1,L,1)-perceptron. This type of comparison is treated in more detail in Reference 12.

The network function for a PLN is piecewise affine. That is, for any weight
assignment W, there is a decomposition of the domain into convex sets, with disjoint
interiors, on each of which the network function Fw is affine. Table 3 shows the weights
for a (2,3,1)-PLN. The decomposition of the domain into 19 'affine pieces' is shown in
Figure 2. Figure 3 shows the 21 pieces which result when the squashing function is also
applied in the output space. For a (2,L,1)-network the number of distinct affine regions in
R(2) can be as great as 2L² + 1 without squashing in the output space.


TABLE 3. Thirteen Weights for a (2, 3, 1)-Network.

1st layer =
    [ 3  -3   0 ]
    [ 3   3   0 ]
    [ 1   0   0 ]

2nd layer = [ 1   1  -2   0 ]

(x1, x2) → (u1, u2, u3) → y → ŷ

u1 = p(3x1 - 3x2)
u2 = p(3x1 + 3x2)
u3 = p(x1)
y  = u1 + u2 - 2u3
ŷ  = p(y)
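The network of Table 3 is small enough to evaluate directly. The sketch below transcribes the five equations of the table; the function name `pln_2_3_1` and the `squash_output` flag are introduced here for illustration.

```python
def p(t):
    """Piecewise linear squashing function: clamp t to [-1, 1]."""
    return max(-1.0, min(1.0, t))


def pln_2_3_1(x1, x2, squash_output=False):
    """Evaluate the (2,3,1)-PLN of Table 3 at the point (x1, x2)."""
    u1 = p(3 * x1 - 3 * x2)
    u2 = p(3 * x1 + 3 * x2)
    u3 = p(x1)
    y = u1 + u2 - 2 * u3
    # Optionally apply the squash in the output space as well.
    return p(y) if squash_output else y
```

On the x1-axis, for example, the output rises to 0.8 at x1 = 0.2, reaches 1.2 at x1 = 0.4 once both u1 and u2 saturate, and returns to 0 at x1 = 1, which illustrates how different affine pieces meet along the slab boundaries.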


Letting Aff(L*) denote the number of affine regions possible in an L*-network,
without squashing in the output space, it can be shown that

\[ \mathrm{Aff}(d, L, 1) = \sum_{0 \le k \le d} 2^k \binom{L}{k}. \]

Thus, for fixed d, Aff(d,L,1) is a polynomial of degree d in L. This formula is a
generalization of the formula for the number of convex regions determined by L
hyperplanes in general position in R(d) (see Reference 7). The regions enumerated above
are determined by L pairs of parallel hyperplanes in R(d).
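The region count can be checked against the planar case quoted earlier. The sketch assumes the formula reads Aff(d,L,1) = sum over 0 ≤ k ≤ d of 2^k C(L,k), a reading that reproduces both the 2L² + 1 bound for d = 2 and the 19 pieces of the (2,3,1) example. The function name `aff_regions` is hypothetical.

```python
from math import comb


def aff_regions(d, L):
    """Number of affine regions for a (d, L, 1)-PLN without output squashing,
    under the assumed formula sum_{k=0}^{d} 2^k * C(L, k)."""
    return sum(2 ** k * comb(L, k) for k in range(d + 1))
```

For d = 2 the sum is 1 + 2L + 2L(L-1) = 2L² + 1, and for d = 1 it is 2L + 1, the number of intervals cut out of the line by 2L breakpoints.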
Henceforth, we denote members of the input space by xi = (x1,i, x2,i, ..., xd,i)^T,
outputs from the first hidden layer by ui = (u1,i, u2,i, ..., uL,i)^T, and members of the output
space by yi = (y1,i, y2,i, ..., ym,i)^T. For the remainder of this section we discuss only
(d,L,m)-networks.

We let f : R(d) → R(L) denote the mapping x → u realized at the output of the hidden
layer. The jth coordinate of the output of the hidden layer defines a mapping
fj : R(d) → R(1). This mapping is determined by the jth row of the first weight matrix A.
Let aj denote the d-vector determined by the first d coordinates of the jth row of A, and
let αj be the (d+1)st coordinate in the jth row of A. Then

\[ f_j(x) = p(a_j x + \alpha_j). \]

This function is piecewise affine with three regions:

\[
f_j(x) = \begin{cases} -1 & \text{for } x \in H_- \\ a_j x + \alpha_j & \text{for } x \in H_0 \\ 1 & \text{for } x \in H_+ \end{cases}
\]

H-, H+ are the half-spaces where aj x + αj is less than -1, greater than +1, respectively, and
H0 is the 'slab' between them. One can consider the single neuron mapping as projecting
the slab onto an interval and the half-spaces onto its endpoints.

The  following  lemma  illustrates  how  the  (INT)  relation  can  be  altered  by  PLN 
mappings.  Two  neurons  are  sufficient  for  switching  points  between  Ext(X)  and  Int(X)  at 
the  hidden  layer.  We  let  U  denote  the  set  f(X)  of  N  outputs  at  the  hidden  layer.  Lemma  1 
says  that  an  interior  point  xi  of  X  can  become  an  exterior  point  ui  of  U,  while 
guaranteeing  that  any  additional  d-1  points  can  be  placed  in  the  interior  of  U.  It  should 
be  noted  that  three  points  are  required  in  Ext(U),  since  u-space  is  two-dimensional.  This 
means  that  X  must  have  at  least  d+2  points. 


LEMMA  1 

Suppose X = {x1, x2, ..., xd+2} is a (d+2)-set in general position in R(d) and
x1 ∈ Int(X). Then there exist weights for a (d,2,m)-PLN for which f(x1) ∈ Ext(U) and
f(xi) ∈ Int(U), 2 ≤ i ≤ d. That is, in two-dimensional u-space, the output of the hidden
layer, Ext(U) is the triangle {f(x1), f(xd+1), f(xd+2)}.

Outline  of  Proof 

Let 


X' = {x1, x2, ..., xd}.

The mapping f : R(d) → R(2) is defined by

\[ f(x) = (f_1(x), f_2(x))^T \]

where

\[ f_j(x) = p(a_j x + \alpha_j), \quad 1 \le j \le 2. \]

The (d-1)-simplex generated by X' lies in a unique hyperplane G0. Since x1 ∈ Int(X), G0
separates xd+1 and xd+2. Let Gj denote the hyperplane through xd+j which is parallel to
G0, and let gj be the affine functional which is -1 on G0 and 1 on Gj, for j = 1, 2. Then the
images of the xi's under the mapping g : x → (g1(x), g2(x))^T = v, after squashing, are given by

\[
v_i = g(x_i) = \begin{cases} (1, -1)^T & \text{for } i = d+1 \\ (-1, -1)^T & \text{for } 1 \le i \le d \\ (-1, 1)^T & \text{for } i = d+2 \end{cases}
\]

The desired mapping f, which must place each ui, 2 ≤ i ≤ d, inside the triangle formed by
u1, ud+1, and ud+2, is obtained by perturbing the mapping g. In so doing the points xi, 2 ≤
i ≤ d, are clustered at (-1, -1) inside the triangle.

The  complete  proof,  including  the  algebraic  details  of  the  construction  of  f,  is 
presented  in  Appendix  A. 


4.  ORDER-MODIFYING  MAPPINGS 


In this section two theorems are proved and two examples of PLN mappings are
constructed. The theorems pertain to (d,d,d) and (d,d,d,d) PLNs. Theorem 1 establishes
a new upper bound on Nmax(d,d,d), while Theorem 2 constructs (d,d,d,d) deformations of
2d+1 points in R(d), which cannot be realized by (d,d,d) networks. The two examples
are included to illustrate how planar order relations can be modified by (2,2,2,2) PLNs.

Order is a fundamental algebraic and geometric concept. One of the better known
linearly ordered sets is R(1) with the usual 'less than' order relation denoted <. As was
pointed out in Section 3, the mapping capabilities of PLN networks arise in a
fundamental way from destruction of the (INT) relationship in finite subsets of Euclidean
spaces R(d). For d = 1 the (INT) relationship is based upon <. For x a member of a finite
subset X of R(1), x ∈ Int(X) if, and only if, min(X) < x < max(X). This dependence of
(INT) upon order in R(1) suggests the possibility of generalizations to higher dimensions.
In this section the notion of order will be generalized from R(1) to R(d), as developed in
Reference 13.

A partially ordered set (poset) is a set X endowed with a partial order P satisfying

(Ord 1)  x P x
(Ord 2)  if x P y and y P x, then x = y
(Ord 3)  if x P y and y P z, then x P z.

A linear order satisfies the additional requirement

(Ord 4)  for all x and y, either x P y or y P x.

An example of a poset which is not linearly ordered is R(2) with the order P defined by:

(x1, x2) P (y1, y2) whenever x1 ≤ y1 and x2 ≤ y2.

In this ordering the points (0,1) and (1,0) are not comparable, so (Ord 4) does not hold.

Redefining the < order in R(1) using signs leads to the following natural algebraic
definition of higher dimensional order.

Suppose T = (x1, x2, ..., xd+1) is a (d+1)-tuple in R(d). Then T is called negative,
degenerate, or positive depending upon the value of the determinant of M(T), where

\[
M(T) = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_{d+1} \end{bmatrix}_{(d+1) \times (d+1)}
\]

and each xi is written as a column of d coordinates.

T is negative if det(M(T)) is negative
T is degenerate if det(M(T)) is zero
T is positive if det(M(T)) is positive.


For d = 1,

\[
T = (x_1, x_2), \qquad M(T) = \begin{bmatrix} 1 & 1 \\ x_1 & x_2 \end{bmatrix}, \qquad \det(M(T)) = x_2 - x_1.
\]

Thus, (x1, x2) is positive provided x1 < x2, as desired.

The three subfamilies into which the family of (d+1)-tuples from R(d) is
decomposed by generalized order can be characterized as follows. The negative and
positive (d+1)-tuples are oriented (ordered) d-simplexes. The degenerate (d+1)-tuples
arise from sets which are not in general position in R(d). That is, T is degenerate provided
there exists some hyperplane in R(d) which contains the set {x1, x2, ..., xd+1}. It should
be noted that permuting the entries of a degenerate T preserves degeneracy, while
non-degenerate (d+1)-tuples alternate between negative and positive when pairs xi, xj are
interchanged.

In R(1) the pair T = (x1, x2) is positive whenever one passes to the right (in the usual
graphic representation of the real line) when going from x1 to x2. Likewise in R(2), the
triple T = (x1, x2, x3) is positive whenever one moves in a counter-clockwise direction
about C when passing from x1 to x2 to x3 to x1, where C = Conv({x1, x2, x3}).
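The sign of a (d+1)-tuple can be computed directly from the determinant of the matrix whose first row is all ones and whose remaining rows hold the point coordinates. The following sketch uses cofactor expansion, which is adequate for the small d considered here; the names `det` and `order_sign` are chosen for illustration.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total


def order_sign(T):
    """Return -1, 0, or +1 for a (d+1)-tuple T of points in R(d)."""
    d = len(T[0])
    # First row of ones, then one row per coordinate; column i holds point x_i.
    M = [[1] * (d + 1)] + [[pt[i] for pt in T] for i in range(d)]
    value = det(M)
    return (value > 0) - (value < 0)
```

In the plane, `order_sign([(0, 0), (1, 0), (0, 1)])` is +1 for the counter-clockwise triple, and interchanging two of its points gives -1.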

Generalized order can be employed to categorize finite subsets in R(d). Given a
k-subset X = {x1, x2, ..., xk} of R(d), each (d+1)-subset receives a label -, 0, or +
depending upon its order. Of course, these labels change if the subscripts on the x's are
changed. For a fixed labeling of the x's one obtains a mapping from the family of
(d+1)-subsets of X to the set {-, 0, +}. The equivalence classes of mappings which are
invariant under permutation of the k subscripts are called the unlabeled d-dimensional
order types (Reference 13). The INT relation can also be utilized to define order by
assigning the symbol Y or N to the pair (x, S) when x ∈ Int(S) or x ∉ Int(S), respectively,
for all x ∈ X and S ⊆ X.

The  following  two  theorems  relate  PLN  mapping  capabilities  to  generalized  order 
properties  of  sets  of  inputs  and  outputs.  The  basic  idea  is  that  mapping  capabilities  as 
measured by Nmax(L*) are related to the extent that order can be jumbled by an L*
network mapping. Theorem 1 says that one cannot quite turn a particular (2d+1)-subset
of R(d) inside out with a (d,d,d)-PLN. On the other hand Theorem 2 demonstrates a way
to  do  this  with  a  (d,d,d,d)-PLN. 


THEOREM 1.  Nmax(d,d,d) ≤ 2d

Proof. It is sufficient to exhibit a set of 2d+1 input/output pairs that cannot be
accommodated by a (d,d,d)-PLN. The following is such a set. Let x1, x2, ..., xd+1 be the
d+1 vertices of a d-simplex S in the interior of I(d), and let xd+2, xd+3, ..., x2d+1 be d
points chosen in the interior of S so that the entire set X = {x1, x2, ..., x2d+1} is in general
position. The outputs, which also lie in R(d), form a permutation of the inputs.
Specifically

\[
y_i = \begin{cases} x_i & \text{for } i = 1 \\ x_{i+d} & \text{for } 2 \le i \le d+1 \\ x_{i-d} & \text{for } d+2 \le i \le 2d+1 \end{cases}
\]

The d interior points of X must be interchanged with d of the vertices (extreme points) of
X. This amounts to nearly turning the simplex inside out. We now show that this is not
possible with one hidden layer.

Consider the values of the 2d+1 points in a fixed coordinate (neuron) of the hidden
layer (u-space). At least two of the d+1 exterior inputs of X must assume extreme values
at the fixed u-coordinate. Since the mapping from the hidden layer to the output space is
1-1 on the set of interest, all exterior points of the convex hull of the image of X in
u-space must also be extreme points in the output space. In particular, at least two of the
exterior inputs must be vertices in the output set. Since only one of the input vertices
goes to an output vertex, namely x1, a contradiction arises. Thus, no mapping sending xi
into yi, for 1 ≤ i ≤ 2d+1, exists.

The following two examples of (2,2,2,2)-PLN mappings illustrate how a second
hidden layer can facilitate order modification. Table 4 and Figure 4 show the 5-sets X
and Y of inputs and outputs, respectively, for Example 4. Ext(X) = {x1, x2, x3} with x4 and
x5 lying inside the 2-simplex. The line joining x4 and x5 cuts the faces {x1, x3} and {x2,
x3} of the 2-simplex. The output set Y also consists of a 2-simplex {y1, y2, y3}, yi = xi, 1
≤ i ≤ 3, with two interior points {y4, y5}. The line joining y4 and y5 also cuts faces {y1,
y3} and {y2, y3}. However, the triples (x1, x2, x3) and (x3, x4, x5) have the same sign,
while (y1, y2, y3) and (y3, y4, y5) have opposite signs.


TABLE 4. Inputs and Outputs for Example 4.

i    xi                     yi
1    (-0.5000, -0.5000)     (-0.5000, -0.5000)
2    (0.5000, -0.5000)      (0.5000, -0.5000)
3    (0.0000, 0.5000)       (0.0000, 0.5000)
4    (-0.1625, -0.1250)     (0.2724, -0.1696)
5    (0.2250, -0.2500)      (-0.1427, -0.2143)

FIGURE 4. Five Inputs and Five Outputs
for Example 4.
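The sign claims for Example 4 can be verified from the coordinates in Table 4. For planar triples the 3-by-3 determinant with a first row of ones equals the cross product of the two edge vectors leaving the first point, which is what the hypothetical helper `orient` below computes.

```python
def orient(a, b, c):
    """Sign (+1, 0, -1) of the planar triple (a, b, c); +1 means counter-clockwise."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


# Coordinates transcribed from Table 4.
X = [(-0.5, -0.5), (0.5, -0.5), (0.0, 0.5), (-0.1625, -0.125), (0.225, -0.25)]
Y = [(-0.5, -0.5), (0.5, -0.5), (0.0, 0.5), (0.2724, -0.1696), (-0.1427, -0.2143)]

signs_x = (orient(X[0], X[1], X[2]), orient(X[2], X[3], X[4]))
signs_y = (orient(Y[0], Y[1], Y[2]), orient(Y[2], Y[3], Y[4]))
```

Both input triples are positive, while the output triples come out as positive and negative: the mapping reverses the orientation of (x3, x4, x5) while fixing the outer simplex.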


Table 5 and Figure 5 show the input and output sets of Example 5. The labeled
order type (pattern of signs) of X is the same as for Example 4. The outputs for Example
5 are, however, different; Ext(Y) = {y3, y4, y5} with y1 and y2 lying inside the simplex.
As in Example 4, the triples (x1, x2, x3) and (x3, x4, x5) have the same sign, while (y1,
y2, y3) and (y3, y4, y5) have opposite signs. The interior line through y1 and y2 cuts the
faces {y3, y4} and {y4, y5}.


TABLE 5. Inputs and Outputs for Example 5.

i    xi                     yi
1    (illegible)            (illegible)
2    (0.5000, -0.5000)      (0.0334, -0.3808)
3    (0.0000, 0.5000)       (0.0000, 0.5000)
4    (-0.1000, -0.1000)     (-0.4999, -0.5000)
5    (0.1000, -0.1000)      (0.5000, -0.5000)

FIGURE 5. Five Inputs and Five Outputs for Example 5.

Tables 6 and 7 contain the weight matrices for the two examples, while Tables 8 and
9 show the u-space and v-space outputs at the hidden layers.

TABLE 6. Three Weight Matrices for Example 4.

1st layer =
    [ 1.4286  -1.2857  -0.3571 ]
    [ 0.7619  -3.3524  -0.2952 ]

2nd layer =
    [ 0.5833  -1.1667   0.4167 ]
    [ 3.1500  -4.3750   2.2250 ]

3rd layer =
    [ -0.4286   0.6786  -0.2500 ]
    [  0.8571  -0.3571   0.0000 ]

TABLE 7. Three Weight Matrices for Example 5.

1st layer =
    [ 5.0000  -1.6667   0.0000 ]
    [ 2.2727   0.4545   0.1818 ]

2nd layer =
    [ 2.4545  -2.5455   0.3485 ]
    [ 3.1818  -3.2727   0.2273 ]

3rd layer =
    [ -2.0706   2.3206   0.2500 ]
    [ -2.5810   2.0810   0.0000 ]

TABLE 8. Intermediate Outputs for Example 4.

i    ui                     vi
1    (-0.4286, 1.0000)      (-1.0000, -1.0000)
2    (1.0000, 1.0000)       (-0.1667, 1.0000)
3    (-1.0000, -1.0000)     (1.0000, 1.0000)
4    (-0.4285, 0.0000)      (0.1668, 0.8752)
5    (0.2858, 0.7143)       (-0.2500, 0.0002)


TABLE 9. Intermediate Outputs for Example 5.

i    ui                     vi
1    (-1.0000, -1.0000)     (0.4395, 0.3182)
2    (1.0000, 1.0000)       (0.2575, 0.1364)
3    (-0.8334, 0.4091)      (-1.0000, -1.0000)
4    (-0.3333, -0.0909)     (-0.2382, -0.5357)
5    (0.6667, 0.3636)       (1.0000, 1.0000)

Example 5 utilizes input/output pairs which, for d = 2, are similar to the sets
employed in the proof of Theorem 1. The argument of Theorem 1 applies to Example 5.
Thus, the action of the (2,2,2,2)-PLN mapping on X, of Example 5, cannot be realized
with a (2,2,2)-PLN.

The  following  theorem  shows  that  generalizations  of  the  (2,2,2,2)  mapping  of 
Example  5  exist  for  all  (d,d,d,d)-PLNs. 


THEOREM  2 

Suppose X is a (2d+1)-set in general position in R(d), X = S ∪ T, |S| = d, |T| = d+1, and
S is a facet in Ext(X). Suppose further that no line joining two points of T is parallel to
the hyperplane through S. Then there is a weight assignment W for a (d,d,d,d)-PLN for
which Fw(S) = Int(Fw(X)), Fw(T) = Ext(Fw(X)), and Fw(X) lies in the interior of I(d).

Remarks 

This theorem says that the set X, consisting of a d-simplex and d interior points, can almost be turned inside out. That is, in the output space $I^{(d)}$, d of the exterior points become interior while the d interior points become exterior. The purpose of placing the output set within the interior of $I^{(d)}$ is to achieve the result without benefit of the squashing function at the output layer. It should be emphasized that Theorem 2 does not say that any mapping between (2d+1)-sets X and Y, each consisting of a d-simplex and d interior points, can be achieved by a (2,2,2,2)-PLN. The theorem only guarantees that certain Ys can be achieved, which cannot be handled with (2,2,2)-PLNs.


5.  SUMMARY 


The  feed-forward  layered  neural  network  has  great  potential  for  fast  computation  of 
discriminant  functions  and  other  transformations  required  in  image  processing  and  pattern 
recognition.  Network  design  and  weight  assignment  are  two  of  the  important  tasks 
arising in neural network applications. The results presented here pertain to mapping construction and capabilities for layered networks with a piecewise linear neuron transfer function.

The main focus of this work is twofold. First, it is shown that a certain type of mapping in d-dimensional Euclidean space cannot be achieved by a (d,d,d)-PLN (piecewise linear network). The mapping of interest involves turning the simplex inside out in Euclidean d-space. It is then shown that such mappings can be achieved by a (d,d,d,d)-PLN. The importance of these results lies in the methodology of the proofs as well as the construction techniques, rather than in the treatment of the particular mapping in d-space. It is also shown that two hidden neurons are sufficient for moving a point from the interior of a set to its exterior. It is this ability to disturb the order properties of Euclidean sets that fosters the mapping complexity of piecewise linear networks.
Appendix A


PROOF  OF  LEMMA  1 


LEMMA  1 

Suppose $X = \{x_1, x_2, \ldots, x_{d+2}\}$ is a (d+2)-set in general position in $R^{(d)}$ and $x_1 \in \mathrm{Int}(X)$. Then there exist weights for a (d,2,m)-PLN for which $f(x_1) \in \mathrm{Ext}(U)$ and $f(x_i) \in \mathrm{Int}(U)$, $2 \le i \le d$. That is, in two-dimensional u-space, the output of the hidden layer, Ext(U) is the triangle $\{f(x_1), f(x_{d+1}), f(x_{d+2})\}$.

Proof 


Let 


X  —  {xj,  X2,  •••»  Xd}  • 

The mapping $f : R^{(d)} \to R^{(2)}$ is defined by

$$f(x) = (f_1(x), f_2(x))^T$$

where

$$f_j(x) = p(a_j x + \alpha_j), \quad 1 \le j \le 2 .$$

The desired mapping f must place each $u_i$, $2 \le i \le d$, inside the triangle formed by $u_{d+1}$, $u_1$, and $u_{d+2}$, where

$$u_{d+1} = (1, -1)^T, \qquad u_1 = (-1, -1)^T, \qquad u_{d+2} = (-1, 1)^T .$$
For the sake of clarity f will be defined algebraically.

There exists a unique unit vector $c'$ and scalar $d'$ satisfying

$$z'_i = c' x_i + d' \;\begin{cases} < 0 & \text{for } i = d+1 \\ = 0 & \text{for } 1 \le i \le d \\ > 0 & \text{for } i = d+2 . \end{cases}$$

Next let $c_0 = \gamma c'$ and $d_0 = \gamma d'$, where

$$\gamma = 1 / \min[-z'_{d+1},\ z'_{d+2}] .$$

This gives

$$z_i = c_0 x_i + d_0 \;\begin{cases} = -1 - \delta_1 & \text{for } i = d+1 \\ = 0 & \text{for } 1 \le i \le d \\ = 1 + \delta_2 & \text{for } i = d+2 , \end{cases}$$


where $\delta_1 \ge 0$ and $\delta_2 \ge 0$. There exists a neighborhood $N_0$ of $c_0$ satisfying the following: for all $c \in N_0$,

$$|c x_i - c_0 x_i| < K \quad \text{for } 1 \le i \le d+2 ,$$

where

$$K = \tfrac{1}{2} \min\big[\, |c_0 x_1 - c_0 x_{d+1}|,\ |c_0 x_1 - c_0 x_{d+2}| \,\big] .$$

This choice of $N_0$ guarantees the existence of $c'' \in N_0$ satisfying

$$c'' x_{d+1} < c'' x_1 < \cdots < c'' x_d < c'' x_{d+2} .$$

There also exists a neighborhood N of $c''$, which is contained in $N_0$, and satisfies

$$c x_{d+1} < c x_1 < c x_2 < \cdots < c x_d < c x_{d+2}$$

for all $c \in N$. Finally we choose two linearly independent vectors $c_1$, $c_2$ in N. The vectors $a_j$ and the scalars $\alpha_j$, j = 1,2, are determined by the $c_j$, $0 \le j \le 2$, and constants $\sigma_j$, $\tau_j$, j = 1,2. In particular

$$a_j = \sigma_j c_0 + \tau_j c_j$$

and

$$\alpha_j = \sigma_j d_0 - 1 - \tau_j c_j x_1 .$$


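For all choices of $\sigma_j$, $\tau_j$, j = 1,2,

$$a_j x_1 + \alpha_j = -1 ,$$

and

$$a_j x_i + \alpha_j = -1 + \tau_j c_j (x_i - x_1) , \quad \text{for } 2 \le i \le d .$$

Next we let

$$\tau_j = \frac{\varepsilon}{c_j (x_d - x_1)} , \quad j = 1,2$$

where $0 < \varepsilon < 1$. This guarantees that

$$-1 < a_j x_i + \alpha_j \le -1 + \varepsilon < 0$$

for j = 1,2, and $2 \le i \le d$.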

For the remaining points $x_{d+1}$, $x_{d+2}$, we have

$$a_j x_{d+1} + \alpha_j = -1 - \sigma_j (1 + \delta_1) - \tau_j c_j (x_1 - x_{d+1})$$

$$a_j x_{d+2} + \alpha_j = -1 + \sigma_j (1 + \delta_2) + \tau_j c_j (x_{d+2} - x_1)$$

for j = 1,2.

The values of $\sigma_1$, $\sigma_2$ are chosen so as to move $a_j x_i + \alpha_j$, j = 1,2, i = d+1, d+2, beyond the thresholds -1, +1:

$$\sigma_1 = -\max[M_1, M_2]$$

$$\sigma_2 = \max[M_3, 0]$$

where

$$M_1 = \frac{2 + \tau_1 c_1 (x_1 - x_{d+1})}{1 + \delta_1}$$

$$M_2 = \frac{\tau_1 c_1 (x_{d+2} - x_1)}{1 + \delta_2}$$

$$M_3 = \frac{2 - \tau_2 c_2 (x_{d+2} - x_1)}{1 + \delta_2} .$$

With these assignments of $a_j$, $\alpha_j$, the following inequalities hold:

$$a_1 x_{d+1} + \alpha_1 \ge 1 , \qquad a_2 x_{d+1} + \alpha_2 \le -1 ,$$

$$a_1 x_{d+2} + \alpha_1 \le -1 , \qquad a_2 x_{d+2} + \alpha_2 \ge 1 .$$

The two-dimensional outputs $u_i$ at the hidden layer are given by

$$u_i = (u_{1,i}, u_{2,i})^T ,$$

where

$$u_{j,i} = p(a_j x_i + \alpha_j) .$$

NAWCWPNS TP 8217 Revision 1


Table A-1 shows the coordinates of the $u_i$, $1 \le i \le d+2$.

The choice of $\varepsilon$ is critical in positioning the $u_i$, $2 \le i \le d$. For $0 < \varepsilon < 2$, all $u_i$ lie inside the square with vertices (±1, ±1). In order to guarantee that the $u_i$ lie in the triangle formed by $u_{d+1}$, $u_1$, $u_{d+2}$, one must also require $\varepsilon < 1$. As $\varepsilon$ approaches 0, the $u_i$ all approach $u_1$.


TABLE A-1. Coordinates of d+2 Points in the u-Plane.

    i       u_{1,i}             u_{2,i}
    1       -1                  -1
    2       -1 + ε_{1,2}        -1 + ε_{2,2}
    3       -1 + ε_{1,3}        -1 + ε_{2,3}
    ...
    d-1     -1 + ε_{1,d-1}      -1 + ε_{2,d-1}
    d       -1 + ε              -1 + ε
    d+1      1                  -1
    d+2     -1                   1

    0 < ε_{j,i} < ε for j = 1,2
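
The construction in this proof can be traced numerically. The following sketch is an illustration rather than the report's own code: the function name `lemma1_weights`, the sample points, and the particular choices of $c_1$, $c_2$ and $\varepsilon$ are hypothetical, and the squashing function is assumed to be p(t) = max(-1, min(1, t)). It evaluates $\tau_j$, $M_1$, $M_2$, $M_3$, $\sigma_j$, $a_j$ and $\alpha_j$ as defined above for a d = 2 example in which $x_1$ is the centroid of the triangle $x_2$, $x_3$, $x_4$.

```python
import numpy as np

def p(t):
    """Squashing function, assumed to be a clip to [-1, 1]."""
    return np.clip(t, -1.0, 1.0)

def lemma1_weights(X, c0, d0, C, eps):
    """Hidden-layer weights following the formulas of the proof of Lemma 1.

    X   : (d+2, d) array with rows x_1, ..., x_{d+2}
    c0  : scaled normal of the hyperplane through x_1, ..., x_d
    d0  : matching offset, so that c0 @ x_i + d0 = 0 for i <= d
    C   : (2, d) array with rows c_1, c_2 taken close to c0
    eps : the constant epsilon, 0 < eps < 1
    """
    d = X.shape[1]
    x1, xd, xd1, xd2 = X[0], X[d - 1], X[d], X[d + 1]
    delta1 = -(c0 @ xd1 + d0) - 1.0          # z_{d+1} = -1 - delta1
    delta2 = (c0 @ xd2 + d0) - 1.0           # z_{d+2} =  1 + delta2
    tau = eps / (C @ (xd - x1))              # tau_j = eps / (c_j (x_d - x_1))
    M1 = (2.0 + tau[0] * (C[0] @ (x1 - xd1))) / (1.0 + delta1)
    M2 = tau[0] * (C[0] @ (xd2 - x1)) / (1.0 + delta2)
    M3 = (2.0 - tau[1] * (C[1] @ (xd2 - x1))) / (1.0 + delta2)
    sigma = np.array([-max(M1, M2), max(M3, 0.0)])
    a = sigma[:, None] * c0 + tau[:, None] * C       # a_j = sigma_j c0 + tau_j c_j
    alpha = sigma * d0 - 1.0 - tau * (C @ x1)        # alpha_j
    return a, alpha

# Hypothetical d = 2 data: x_1 is interior to the triangle x_2, x_3, x_4.
X = np.array([[0.0, 0.0], [2.0, 0.0], [-1.0, -1.5], [-1.0, 1.5]])
c0, d0 = np.array([0.0, 2.0 / 3.0]), 0.0      # line through x_1, x_2, scaled by gamma
C = np.array([[0.10, 2.0 / 3.0], [0.05, 2.0 / 3.0]])

# Each c_j must preserve the ordering c x_{d+1} < c x_1 < ... < c x_d < c x_{d+2}.
for c in C:
    proj = X @ c
    assert proj[2] < proj[0] < proj[1] < proj[3]

a, alpha = lemma1_weights(X, c0, d0, C, eps=0.5)
U = p(X @ a.T + alpha)                        # row i holds u_{i+1}
print(U)
```

With these choices the interior point $x_1$ lands on the vertex (-1, -1), the two remaining vertices of the triangle are $u_3 = (1, -1)$ and $u_4 = (-1, 1)$, and $u_2 = (-0.5, -0.5)$ lies strictly inside.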
Appendix B


PROOF  OF  THEOREM  2 


Throughout this appendix we assume that $d \ge 2$. Lemma 2 establishes the following useful fact: the minimum member of a set of d+2 real numbers can be placed anywhere inside the convex hull (d-simplex) of the remaining d+1 members by a (1,d,d)-PLN.


LEMMA  2 

Let V denote the d-simplex in $R^{(d)}$ with vertices $v_i$, $2 \le i \le d+2$, where

$$\begin{aligned}
v_2 &= (-1,-1,-1,\ldots,-1,-1)^T \\
v_3 &= (1,-1,-1,\ldots,-1,-1)^T \\
v_4 &= (1,1,-1,\ldots,-1,-1)^T \\
v_5 &= (1,1,1,\ldots,-1,-1)^T \\
&\ \ \vdots \\
v_{d+1} &= (1,1,1,\ldots,1,-1)^T \\
v_{d+2} &= (1,1,1,\ldots,1,1)^T
\end{aligned}$$

and let $v_1$ be any point inside V. If $z_1, z_2, \ldots, z_{d+2}$ are real numbers satisfying

$$z_1 < z_2 < \cdots < z_{d+2} ,$$

then there exist weights for a (1,d,d)-PLN which map $z_i$ into $v_i$, $1 \le i \le d+2$.
Proof

The first layer weights $a_j$, $\alpha_j$ are given by

$$a_1 = \frac{2}{z_2 - z_1} , \qquad \alpha_1 = \frac{z_1 + z_2}{z_1 - z_2} ,$$

$$a_j = \frac{2}{z_{j+2} - z_1} , \qquad \alpha_j = \frac{z_1 + z_{j+2}}{z_1 - z_{j+2}} \qquad \text{for } 2 \le j \le d .$$

The mapping $z_i \to u_i$ from input space to the output of the hidden layer is defined by

$$u_{j,i} = p(a_j z_i + \alpha_j) ,$$

giving

giving 
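$$\begin{aligned}
u_1 &= (-1,-1,-1,-1,\ldots,-1,-1)^T \\
u_2 &= (1, u_{2,2}, u_{3,2}, u_{4,2}, \ldots, u_{d-1,2}, u_{d,2})^T \\
u_3 &= (1, u_{2,3}, u_{3,3}, u_{4,3}, \ldots, u_{d-1,3}, u_{d,3})^T \\
u_4 &= (1, 1, u_{3,4}, u_{4,4}, \ldots, u_{d-1,4}, u_{d,4})^T \\
u_5 &= (1, 1, 1, u_{4,5}, \ldots, u_{d,5})^T \\
&\ \ \vdots \\
u_{d+2} &= (1,1,1,1,\ldots,1,1)^T .
\end{aligned}$$

From the monotonicity of the $z_i$'s it also follows that

$$-1 < u_{j,2} < u_{j,3} < u_{j,4} < \cdots < 1$$

for $2 \le j \le d$, i.e. all rows of the matrix of $u_i$'s are monotone.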

In order to realize the specific positions of 1's and -1's in the $u_i$'s, the first layer weights are uniquely determined as shown above. Slight perturbations of the $z_i$'s, which preserve monotonicity, will result in slight perturbations of the $a_j$'s and $\alpha_j$'s. These perturbed weights produce the same pattern of +1's and -1's, while perturbing the remaining $u_{j,i}$'s slightly.

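The second layer weights $b_{j,k}$, $\beta_j$, are given by

$$b_{1,1} = -1 - \tfrac{1}{2} v_{1,1} + \frac{2u_{2,2} + 2}{u_{2,2} - u_{2,3}}$$

$$b_{1,2} = \frac{4}{u_{2,3} - u_{2,2}}$$

$$\beta_1 = -1 + \tfrac{1}{2} v_{1,1} + \frac{2u_{2,2} - 2}{u_{2,2} - u_{2,3}}$$

$$b_{j,1} = 1 - \tfrac{1}{2} v_{j,1} - \frac{4}{1 - u_{j,j+1}} , \qquad b_{j,j} = \frac{4}{1 - u_{j,j+1}} , \qquad \beta_j = 1 + \tfrac{1}{2} v_{j,1} \qquad \text{for } 2 \le j \le d$$

and all other $b_{j,k}$'s are zero. The images of the $u_i$'s under the affine mapping $u \to Bu + \beta$, before squashing, are shown below, where

$$B = [b_1^T, b_2^T, \ldots, b_d^T]^T , \qquad b_j = [b_{j,1}, b_{j,2}, \ldots, b_{j,d}]$$

and

$$\beta = (\beta_1, \beta_2, \ldots, \beta_d)^T .$$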

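$$b_j u_1 + \beta_j = v_{j,1} \quad \text{for } 1 \le j \le d$$

$$b_1 u_2 + \beta_1 = -2$$

$$b_1 u_3 + \beta_1 = 2$$

$$b_1 u_i + \beta_1 = 2 + \frac{4(1 - u_{2,3})}{u_{2,3} - u_{2,2}} > 2 \quad \text{for } 4 \le i \le d+2$$

$$b_j u_i + \beta_j = -2 + \frac{4(u_{j,i} - u_{j,j+1})}{1 - u_{j,j+1}} \le -2 \quad \text{for } 2 \le i \le j+1 \text{ and } 2 \le j \le d$$

$$b_j u_i + \beta_j = 2 + \frac{4(u_{j,i} - 1)}{1 - u_{j,j+1}} = 2 \quad \text{for } i \ge j+2 \text{ and } 2 \le j \le d$$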


After squashing (application of the function p) we have $p(Bu_i + \beta) = v_i$. It is important to note that the values before squashing satisfy

$$|b_j u_i + \beta_j| \ge 2$$

for all j when $i \ge 2$. This is helpful when considering small perturbations in the data. Suppose that $u_i'$ lies in a small neighborhood of $u_i$, for $2 \le i \le d+2$. Then $p(Bu_i' + \beta) = v_i$ for all such i. Moreover, if $u_1'$ is close to $u_1$, then $v_1' = p(Bu_1' + \beta)$ will lie in a small neighborhood of $v_1$. Indeed a sufficiently small neighborhood of $u_1$ can be mapped into a neighborhood of $v_1$, which lies in the interior of V.
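
The weight assignment of Lemma 2 can be checked on a small case. The sketch below is illustrative only: the function names `lemma2_weights` and `forward`, the sample data, and the assumed form p(t) = max(-1, min(1, t)) are not taken from the report, and the expressions used for rows $j \ge 2$ of B are the ones implied by the conditions $b_j u_1 + \beta_j = v_{j,1}$, $b_j u_{j+1} + \beta_j = -2$ and $b_j u_{j+2} + \beta_j = 2$.

```python
import numpy as np

def p(t):
    """Squashing function, assumed to be a clip to [-1, 1]."""
    return np.clip(t, -1.0, 1.0)

def lemma2_weights(z, v1):
    """Return (a, alpha, B, beta) of a (1,d,d)-PLN for increasing z and target v1.

    Requires d >= 2 and len(z) == d + 2. Array indices are 0-based, so
    U[j, i] stores the quantity written u_{j+1, i+1} in the text.
    """
    z, v1 = np.asarray(z, float), np.asarray(v1, float)
    d = v1.size
    # Neuron 1 reaches +1 at z_2; neuron j >= 2 reaches +1 at z_{j+2}.
    top = np.concatenate(([z[1]], z[3:d + 2]))
    a = 2.0 / (top - z[0])
    alpha = (z[0] + top) / (z[0] - top)
    U = p(np.outer(a, z) + alpha[:, None])
    B, beta = np.zeros((d, d)), np.zeros(d)
    u22, u23 = U[1, 1], U[1, 2]
    B[0, 0] = -1.0 - 0.5 * v1[0] + (2.0 * u22 + 2.0) / (u22 - u23)
    B[0, 1] = 4.0 / (u23 - u22)
    beta[0] = -1.0 + 0.5 * v1[0] + (2.0 * u22 - 2.0) / (u22 - u23)
    for j in range(1, d):
        B[j, j] = 4.0 / (1.0 - U[j, j + 1])
        B[j, 0] = 1.0 - 0.5 * v1[j] - B[j, j]
        beta[j] = 1.0 + 0.5 * v1[j]
    return a, alpha, B, beta

def forward(z, a, alpha, B, beta):
    """Return squashed outputs and pre-squashing values, one row per input z_i."""
    U = p(np.outer(a, np.asarray(z, float)) + alpha[:, None])
    pre = (B @ U + beta[:, None]).T
    return p(pre), pre

# Hypothetical example with d = 3: five increasing reals, v_1 strictly inside V.
z = [0.0, 1.0, 2.0, 3.0, 4.0]
v1 = [0.5, 0.0, -0.5]
weights = lemma2_weights(z, v1)
out, pre = forward(z, *weights)
print(out)
```

For this input the first row of `out` is $v_1$ and the remaining rows are the simplex vertices $v_2, \ldots, v_5$. The pre-squashing values for $i \ge 2$ all have magnitude at least 2, which is the margin the perturbation argument relies on.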
THEOREM 2

Suppose X is a (2d+1)-set in general position in $R^{(d)}$, $X = S \cup T$, $|S| = d$, $|T| = d+1$, and S is a facet in Ext(X). Suppose further that no line joining two points of T is parallel to the hyperplane through S. Then there is a weight assignment W for a (d,d,d,d)-PLN for which $F_W(S)$ lies inside the interior of $\mathrm{Conv}(F_W(T))$. Moreover $F_W(T)$ lies in the interior of the unit cube.

Outline  of  Proof 

The proof employs an intermediate set of weights $(A^{(0)}, \alpha^{(0)}, B, \beta, I_d, 0_d)$ as well as the final weights $W = (A, \alpha, B, \beta, C, \gamma)$. Here $I_d$ is the d by d identity matrix and $0_d$ is the d-vector of zeroes. The first set maps T to the d-simplex V while mapping all of S to the single point $v_1$ inside V. The first layer weights $(A^{(0)}, \alpha^{(0)})$ are perturbed slightly to obtain $(A, \alpha)$. The two-layer mapping $(A, \alpha, B, \beta)$ also sends T to V while mapping S into a cluster of points in a small neighborhood of $v_1$ lying entirely within the interior of V. The third layer weights $(C, \gamma)$ just map the simplex V into the interior of $[-1, 1]^{(d)}$, so that squashing at the output layer is irrelevant. The inequalities satisfied by $b_j u_i + \beta_j$ allow the use of the second layer weights $(B, \beta)$ in both mappings.

The linear functional $x \to h^{(0)} x$, which is constant on S, maps $x_i$ into the $z_i^{(0)}$. The first layer weights $(A^{(0)}, \alpha^{(0)})$ are then determined by the values $z_i^{(0)}$. These determine the $u_{j,i}^{(0)}$'s which, together with $v_1$, determine the second layer $(B, \beta)$. $\|B\|$, $v_1$, and the $z_i^{(0)}$'s are used to define a small neighborhood $\eta_1$ of $h^{(0)}$ in the boundary $\partial\mathrm{Sph}$ of the unit sphere Sph in $R^{(d)}$. A suitable $h^{(1)}$ is selected in $\eta_1$ which defines the mapping $x_i \to z_i^{(1)}$. The $z_i^{(1)}$'s in turn determine a neighborhood $\eta_2$ of $h^{(1)}$ in $\partial\mathrm{Sph}$. Finally a basis $h_1, h_2, \ldots, h_d$ of vectors is chosen from $\eta_2$. These functionals are used to define the first layer $(A, \alpha)$ of weights.

Proof 

We let $S = \{x_1, x_2, \ldots, x_d\}$ and $T = \{x_{d+1}, x_{d+2}, \ldots, x_{2d+1}\}$. There exists a unique hyperplane H through S. Since S is a facet of X, the set T is not separated by H. Thus, there exists a unique unit vector $h^{(0)}$ and a unique scalar z satisfying

$$h^{(0)} x_i = z \ \text{ if } 1 \le i \le d , \qquad h^{(0)} x_i > z \ \text{ if } d+1 \le i \le 2d+1 .$$

Letting $z_i^{(0)} = h^{(0)} x_i$, we have

$$z_i^{(0)} > z_1^{(0)} = z$$

for $d+1 \le i \le 2d+1$. Moreover, since no line through two members of T is parallel to H, the $z_i^{(0)}$'s must be distinct, for $d+1 \le i \le 2d+1$. Therefore, after relabeling (if necessary), we have

$$z_1^{(0)} < z_{d+1}^{(0)} < z_{d+2}^{(0)} < \cdots < z_{2d+1}^{(0)} .$$

The $z_i^{(0)}$'s play the role of the $z_i$'s in the preceding Lemma, after setting $z_i = z_{i+d-1}^{(0)}$, $1 \le i \le d+2$.

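The intermediate first layer weights $(A^{(0)}, \alpha^{(0)})$ are given by

$$A^{(0)} = \hat{A}^{(0)} H^{(0)}$$

where

$$\hat{A}^{(0)} = \begin{bmatrix} a_1^{(0)} & 0 & 0 & \cdots & 0 \\ 0 & a_2^{(0)} & 0 & \cdots & 0 \\ 0 & 0 & a_3^{(0)} & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_d^{(0)} \end{bmatrix}_{d \times d}$$

and

$$H^{(0)} = \begin{bmatrix} h^{(0)} \\ h^{(0)} \\ \vdots \\ h^{(0)} \end{bmatrix}_{d \times d} .$$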


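The $a_j^{(0)}$'s and $\alpha_j^{(0)}$'s are given by

$$a_1^{(0)} = \frac{2}{z_{d+1}^{(0)} - z_1^{(0)}} , \qquad \alpha_1^{(0)} = \frac{z_1^{(0)} + z_{d+1}^{(0)}}{z_1^{(0)} - z_{d+1}^{(0)}} ,$$

$$a_j^{(0)} = \frac{2}{z_{j+d+1}^{(0)} - z_1^{(0)}} , \qquad \alpha_j^{(0)} = \frac{z_1^{(0)} + z_{j+d+1}^{(0)}}{z_1^{(0)} - z_{j+d+1}^{(0)}} \qquad \text{for } 2 \le j \le d .$$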


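The outputs $u_{j,i}^{(0)}$, for $1 \le j \le d$ and $d \le i \le 2d+1$, are defined by

$$u_{j,i}^{(0)} = p(a_j^{(0)} z_i^{(0)} + \alpha_j^{(0)})$$

and take the following form.

$$\begin{aligned}
u_d^{(0)} &= (-1, -1, -1, \ldots, -1, -1)^T \\
u_{d+1}^{(0)} &= (1, u_{2,d+1}^{(0)}, u_{3,d+1}^{(0)}, \ldots, u_{d-1,d+1}^{(0)}, u_{d,d+1}^{(0)})^T \\
u_{d+2}^{(0)} &= (1, u_{2,d+2}^{(0)}, u_{3,d+2}^{(0)}, \ldots, u_{d-1,d+2}^{(0)}, u_{d,d+2}^{(0)})^T \\
u_{d+3}^{(0)} &= (1, 1, u_{3,d+3}^{(0)}, \ldots, u_{d-1,d+3}^{(0)}, u_{d,d+3}^{(0)})^T \\
&\ \ \vdots \\
u_{2d}^{(0)} &= (1, 1, 1, \ldots, 1, u_{d,2d}^{(0)})^T \\
u_{2d+1}^{(0)} &= (1, 1, 1, \ldots, 1, 1)^T
\end{aligned}$$

The outputs of the jth neuron are monotonic:

$$-1 < u_{j,d+1}^{(0)} < u_{j,d+2}^{(0)} < u_{j,d+3}^{(0)} < \cdots < u_{j,d+j}^{(0)} < 1 .$$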
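
The intermediate first layer can be illustrated with a short computation. This is a sketch under stated assumptions, not the report's procedure: the function name `intermediate_first_layer` and the sample points are hypothetical, the unit normal $h^{(0)}$ is obtained from a singular value decomposition (one of several ways to find the hyperplane through S), and p is assumed to be the clip p(t) = max(-1, min(1, t)).

```python
import numpy as np

def p(t):
    """Squashing function, assumed to be a clip to [-1, 1]."""
    return np.clip(t, -1.0, 1.0)

def intermediate_first_layer(S, T):
    """Return (A0, alpha0, T_sorted) for the intermediate first layer.

    S : (d, d) array, the d points of the facet
    T : (d+1, d) array, the remaining points, all on one side of H
    """
    S, T = np.asarray(S, float), np.asarray(T, float)
    d = S.shape[1]
    # Unit normal of the hyperplane through S: null vector of the differences.
    _, _, Vt = np.linalg.svd(S[1:] - S[0])
    h0 = Vt[-1]
    if (T @ h0 < S[0] @ h0).all():
        h0 = -h0                          # orient h0 so that T lies above H
    z = S[0] @ h0                         # common value of h0 x_i on S
    zT = T @ h0
    assert (zT > z).all(), "T must lie strictly on one side of H"
    order = np.argsort(zT)                # relabel T so the z_i^(0) increase
    T, zT = T[order], zT[order]
    assert (np.diff(zT) > 0).all(), "the z_i^(0) on T must be distinct"
    # Neuron 1 uses z_{d+1}; neuron j >= 2 uses z_{j+d+1}.
    top = np.concatenate(([zT[0]], zT[2:d + 1]))
    a0 = 2.0 / (top - z)
    alpha0 = (z + top) / (z - top)
    A0 = a0[:, None] * h0                 # row j of A0 is a_j^(0) h0
    return A0, alpha0, T

# Hypothetical d = 2 data: S spans the x-axis segment, T lies above it.
S = [[0.0, 0.0], [1.0, 0.0]]
T = [[0.7, 1.0], [0.4, 2.0], [0.2, 0.5]]
A0, alpha0, T_sorted = intermediate_first_layer(S, T)
X = np.vstack([S, T_sorted])
U0 = p(X @ A0.T + alpha0)                 # row i holds u_{i+1}^(0)
print(U0)
```

Both points of S map to (-1, -1), the first coordinate equals 1 on all of T, and the second coordinate increases monotonically over the relabeled T until it saturates at 1.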


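The second layer of weights $(B, \beta)$ is given by

$$\beta = (\beta_1, \beta_2, \ldots, \beta_d)^T$$

where

$$b_{1,1} = -1 - \tfrac{1}{2} v_{1,1} + \frac{2u_{2,d+1}^{(0)} + 2}{u_{2,d+1}^{(0)} - u_{2,d+2}^{(0)}}$$

$$b_{1,2} = \frac{4}{u_{2,d+2}^{(0)} - u_{2,d+1}^{(0)}}$$

$$\beta_1 = -1 + \tfrac{1}{2} v_{1,1} + \frac{2u_{2,d+1}^{(0)} - 2}{u_{2,d+1}^{(0)} - u_{2,d+2}^{(0)}}$$

$$b_{j,1} = 1 - \tfrac{1}{2} v_{j,1} - \frac{4}{1 - u_{j,d+j}^{(0)}} , \qquad b_{j,j} = \frac{4}{1 - u_{j,d+j}^{(0)}} , \qquad \beta_j = 1 + \tfrac{1}{2} v_{j,1} \qquad \text{for } 2 \le j \le d$$

and all other $b_{j,k}$'s are zero. It should be noted that the expressions for the $b_{j,k}$ and $\beta_j$ are identical to those in Lemma 2 with each $u_{j,i}$ replaced by $u_{j,i+d-1}^{(0)}$.

As in Lemma 2, images of the $u_i^{(0)}$ under the mapping $u \to Bu + \beta$, before squashing, satisfy

$$|b_j u_i^{(0)} + \beta_j| \ge 2$$

for $1 \le j \le d$ and $d+1 \le i \le 2d+1$.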
The neighborhood $\eta_1$ depends upon the following constants:


^1  = 


ry 

2i5||Vd 


K2  =  max 
j 


K3  = 


=  min[l,i?v] 


4®ic  =  max 


,(0) 


l<j<2d+l 


where $R_V$ is the radius of a sphere centered at $v_1$, which is contained entirely within the simplex V, and $\|B\|$ is the norm of the matrix B.

We let $\eta_1$ be a neighborhood of $h^{(0)}$ in $\partial\mathrm{Sph}$ satisfying the following:

$$|h' x_i - h^{(0)} x_i| < \delta_1 \quad \text{for all } h' \in \eta_1 \text{ and } 1 \le i \le 2d+1 ,$$

where $\delta_1$ is the minimum of the five quantities
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2 


1 

4K2 


-minjzj^®] - :d<i< 2dj 


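Since $\eta_1$ contains $h^{(0)}$, which maps the set S into the single point z, $\eta_1$ must contain some $h^{(1)}$ satisfying

$$z_1^{(1)} < z_2^{(1)} < \cdots < z_d^{(1)} ,$$

where $z_i^{(1)} = h^{(1)} x_i$. From the constraints on $\delta_1$ we have, for $d \le i \le 2d$,

$$\begin{aligned}
z_{i+1}^{(1)} - z_i^{(1)} &= z_{i+1}^{(1)} - z_{i+1}^{(0)} + z_{i+1}^{(0)} - z_i^{(0)} + z_i^{(0)} - z_i^{(1)} \\
&\ge z_{i+1}^{(0)} - z_i^{(0)} - \big| z_{i+1}^{(1)} - z_{i+1}^{(0)} \big| - \big| z_i^{(0)} - z_i^{(1)} \big| \\
&> \min\big[ z_{i+1}^{(0)} - z_i^{(0)} : d \le i \le 2d \big] - 2\delta_1 \\
&\ge \tfrac{1}{2} \min\big[ z_{i+1}^{(0)} - z_i^{(0)} : d \le i \le 2d \big] > 0 .
\end{aligned}$$

This guarantees that monotonicity of the $z_i^{(1)}$'s is maintained, i.e.

$$z_d^{(1)} < z_{d+1}^{(1)} < \cdots < z_{2d+1}^{(1)} .$$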
The neighborhood $\eta_2$ of $h^{(1)}$ in $\partial\mathrm{Sph}$ is chosen so that

$$|h' x_i - h^{(1)} x_i| < \delta_2 \quad \text{for all } h' \in \eta_2 \text{ and } 1 \le i \le 2d+1 ,$$

where $\delta_2$ is the minimum of the five quantities
1-^ 

Ki-Si 

_ 5, 

2K2 

K^-Si 

|min[z^\-zP:l<i<2d]  , 

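and we let $\delta_3 = \delta_1 + \delta_2$. The constraints on $\delta_2$ and $h^{(1)}$ guarantee that

$$(*1) \qquad h' x_{i+1} - h' x_i > z_{i+1}^{(1)} - z_i^{(1)} - 2\delta_2 > 0 \quad \text{for all } h' \in \eta_2 \text{ and } 1 \le i \le 2d .$$

Thus, monotonicity of the $h' x_i$'s is maintained for all $h' \in \eta_2$.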

Finally we choose a basis $h_1, h_2, \ldots, h_d$ from $\eta_2$. Letting $z_{j,i} = h_j x_i$, for $1 \le j \le d$ and $1 \le i \le 2d+1$, the final set of first layer weights $(A, \alpha)$ is given by




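$$A = \hat{A} H$$

$$\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_d)^T$$

where

$$\hat{A} = \begin{bmatrix} a_1 & 0 & 0 & \cdots & 0 \\ 0 & a_2 & 0 & \cdots & 0 \\ 0 & 0 & a_3 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_d \end{bmatrix}_{d \times d} , \qquad H = \begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_d \end{bmatrix}_{d \times d} .$$

The $a_j$'s and $\alpha_j$'s are given by

$$a_1 = \frac{2}{z_{1,d+1} - z_{1,1}} , \qquad \alpha_1 = \frac{z_{1,1} + z_{1,d+1}}{z_{1,1} - z_{1,d+1}} ,$$

$$a_j = \frac{2}{z_{j,j+d+1} - z_{j,1}} , \qquad \alpha_j = \frac{z_{j,1} + z_{j,j+d+1}}{z_{j,1} - z_{j,j+d+1}} \qquad \text{for } 2 \le j \le d .$$
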


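The outputs $u_{j,i}$, for $1 \le j \le d$ and $1 \le i \le 2d+1$, are defined by

$$u_{j,i} = p(a_j z_{j,i} + \alpha_j)$$

and take the following form

$$\begin{aligned}
u_1 &= (-1, -1, \ldots, -1, -1)^T \\
u_2 &= (-1 + \varepsilon_{1,2}, -1 + \varepsilon_{2,2}, \ldots, -1 + \varepsilon_{d,2})^T \\
&\ \ \vdots \\
u_d &= (-1 + \varepsilon_{1,d}, -1 + \varepsilon_{2,d}, \ldots, -1 + \varepsilon_{d,d})^T \\
u_{d+1} &= (1, u_{2,d+1}, u_{3,d+1}, \ldots, u_{d-1,d+1}, u_{d,d+1})^T \\
u_{d+2} &= (1, u_{2,d+2}, u_{3,d+2}, \ldots, u_{d-1,d+2}, u_{d,d+2})^T \\
u_{d+3} &= (1, 1, u_{3,d+3}, \ldots, u_{d-1,d+3}, u_{d,d+3})^T \\
u_{d+4} &= (1, 1, 1, \ldots, u_{d-1,d+4}, u_{d,d+4})^T \\
&\ \ \vdots \\
u_{2d} &= (1, 1, 1, \ldots, 1, u_{d,2d})^T \\
u_{2d+1} &= (1, 1, 1, \ldots, 1, 1)^T .
\end{aligned}$$
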

In order to prove the theorem it must be shown that

$$(*2) \qquad \| p(Bu_i + \beta) - v_1 \| < R_V \quad \text{for } 1 \le i \le d ,$$

and

$$(*3) \qquad p(Bu_i + \beta) = p(Bu_i^{(0)} + \beta) \quad \text{for } d+1 \le i \le 2d+1 .$$

The first inequality establishes the clustering of the points in S about $v_1$, while the second shows that the $u_i$'s map into the same simplex V as do the $u_i^{(0)}$'s, $d+1 \le i \le 2d+1$. To this end we define the following quantities that bound the changes in the outputs between the mappings defined by $(A^{(0)}, \alpha^{(0)})$ and $(A, \alpha)$.


Dfl  =  max] 


=  max 


:\<j<d 


D^^  =  max  Ujji  -  :1  <  y  <  rf,  and  1  <  i  <  2d  + 1 


D^^  =  maxi 


:l<j< d, and  1  < i  < d 


Invoking the upper bounds imposed on $\delta_1$ and $\delta_2$, the following inequalities can be proved.

(♦4)  Da<25sKi 
(*5)  D^<28^KhZ. 

(*6)  D^<5^K^[\  +  25-iK2^AK2Z^^)<^ 

(*7)  <  283/5:2(1  +  283/5:2)  <  2D„ 
Proof of (*4).

For  1  <  j  <  d  we  have 


^  ^(0) 

2 

2 

aj-a)> 

zy  j+4+1  ~  ^y. 

1  ^y+4+i  h 

=  2 


(^f+d+i  - 


(^jj+d+i  -  ^j,\ )[^f+d+\  ^1^^) 

2^3 


\l,(fi)  ,(0)\ 

(2y,y+4+i-^y,i 

H^y+d+i  h  j 

<2 

[^i,j+d+\  ~ 

4^3 

^ _ 4£j _ 

2Vj+d-H  ^1  A  7+‘'+l  1  / 

8^1 


z(0)  .joiy 
v+4+1  h  ) 


853 

f,(0)  _/0)f 

1^4+1  ) 


<-^^  =  153KI 

^IkI  ^ 
Proof of (*5).

For  1  <  j  <  d  we  have 


+  ^jJ+d+\ 

/O)  (0) 

+  ^7+d+l 

^7.1  “  ^j,j+d+\ 

,(0)  _  .(0) 
^;+d+i 

-27.l4+d+l| 

(^7.1  “  ^jJ+d+lj 
Proof of (*6).

For  1  <  j  <  d  and  1  <  i  ^  2d+l 


<Da  + 


<Da  + 


z--#) 


+  ZV 


(0) 


-  ■*■  (^2  ^a)^3  ^max^a 


<  253^7^  +  K25^  +  2532/i:|  +  2zSL5^Ki 
=  ^3^:2  (l  +  252K2  +  4^r2Z^  )  =  • 
Proof of (*7).

For  1  <  j  <  d  and  1  <  i  <  d 


l“w  + “j) 


a, 


2-  -2^0) 


(K2  +  0^)253  =  2«3(*:2  +  2SiK^) 


=  2«3A:2(I  +  253*r2)  <  2(5, <  2D„ 

^3 
Considering (*2) we have, for $1 \le i \le d$,


\p{Bui  +P)-vi\\  =  \\p{BUi  +P)-  p{Bui  +  p)\\ 


<  +P)-  {Bui  +  ^)|| 


<||B|lVdD' 


UiA  ^3 . 


V  — 


since $\delta_3 = \delta_1 + \delta_2 \le K_3$.

Proceeding with (*3) we have, for $d+1 \le i \le 2d+1$,

||p(fi«,.  +0)-  p{Buf^  +  p)||  <  |b(«,  -  «!“>  I 

<||b1V3d„ 

J  r,  YB,Ari 

U'fiA  ^3 

IK,  2 

since $\delta_3 = \delta_1 + \delta_2 \le K_3$, and $r_V \le 1$.

Remark 

No weight assignment for a (d,d,d)-PLN can effect the mapping guaranteed by this theorem when $|\mathrm{Ext}(X)| = d+1$. In this case d of the d+1 members of T must be in Int(X).